

<!DOCTYPE html>
<html lang="zh-CN">

<head><meta name="generator" content="Hexo 3.9.0">
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>2.1开始第一个爬虫程序 - 杨云召 | 博客</title>
  <meta name="apple-mobile-web-app-capable" content="yes">
  <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
  <meta name="google" content="notranslate">

  
  
  <meta name="description" content="1. 安装IDE以及hello world
一个优秀的..."> 
  
  <meta name="author" content="杨云召（Flywith24）"> 

  
    <link rel="icon" href="/images/icons/favicon-16x16.png" type="image/png" sizes="16x16">
  
  
    <link rel="icon" href="/images/icons/favicon-32x32.png" type="image/png" sizes="32x32">
  
  
    <link rel="apple-touch-icon" href="/images/icons/apple-touch-icon.png" sizes="180x180">
  
  
    <meta rel="mask-icon" href="/images/icons/stun-logo.svg" color="#333333">
  
  
    <meta rel="msapplication-TileImage" content="/images/icons/favicon-144x144.png">
    <meta rel="msapplication-TileColor" content="#000000">
  

  <link rel="stylesheet" href="/css/style.css">

  
  <link rel="stylesheet" href="//at.alicdn.com/t/font_1445822_hhodbqn7tit.css">
  

  
  
  <link rel="stylesheet" href="https://cdn.bootcss.com/fancybox/3.5.7/jquery.fancybox.min.css">
  

  
  
  <link rel="stylesheet" href="https://cdn.bootcss.com/highlight.js/9.18.1/styles/xcode.min.css">
  

  <script>
    var CONFIG = window.CONFIG || {};
    var ZHAOO = window.ZHAOO || {};
    CONFIG = {
      isHome: false,
      fancybox: true,
      pjax: false,
      lazyload: {
        enable: true,
        loadingImage: '/images/theme/loading.gif',
      },
      donate: {
        enable: true,
        alipay: 'https://gitee.com/flywith24/Album/raw/master/img/20201015113301.jpg',
        wechat: 'https://gitee.com/flywith24/Album/raw/master/img/20201015112814.png'
      },
      motto: {
        api: '',
        default: '不人云亦云，只求接近真相.'
      },
      galleries: {
        enable: true
      },
      fab: {
        enable: true,
        alwaysShow: false
      },
      carrier: {
        enable: true
      },
      daovoice: {
        enable: true
      }
    }
  </script>

  

  
</head></html>
<body class="lock-screen">
  <div class="loading"></div>
  


<nav class="navbar">
  <div class="left"></div>
  <div class="center">2.1开始第一个爬虫程序</div>
  <div class="right">
    <i class="iconfont iconmenu j-navbar-menu"></i>
  </div>
</nav>

  <nav class="menu">
  <div class="menu-wrap">
    <div class="menu-close">
      <i class="iconfont iconbaseline-close-px"></i>
    </div>
    <ul class="menu-content">
      
      
      
      
      <li class="menu-item"><a href="/ " class="underline"> 首页</a></li>
      
      
      
      
      <li class="menu-item"><a href="/archives " class="underline"> 归档</a></li>
      
      
      
      
      <li class="menu-item"><a href="/tags " class="underline"> 标签</a></li>
      
      
      
      
      <li class="menu-item"><a href="/categories " class="underline"> 分类</a></li>
      
      
      
      
      <li class="menu-item"><a href="/about " class="underline"> 关于</a></li>
      
    </ul>
    <div class="menu-copyright"><p>Copyright© 2017 - 2020 ❤️ Flywith24</p></div>
  </div>
</nav>
  <main id="main">
  <div class="container" id="container">
    <article class="article">
  <div class="wrap">
    <section class="head">
  <img   class="lazyload" data-original="https://gitee.com/flywith24/Album/raw/master/img/20201015085629.png" src=""  draggable="false">
  <div class="head-mask">
    <h1 class="head-title">2.1开始第一个爬虫程序</h1>
    <div class="head-info">
      <span class="post-info-item"><i class="iconfont iconcalendar"></i>六月 18, 2017</span
        class="post-info-item">
      
      <span class="post-info-item"><i class="iconfont iconfont-size"></i>1324</span>
    </div>
  </div>
</section>
    <section class="main">
      <section class="content">
        <p><a href="https://www.jetbrains.com/pycharm/" target="_blank" rel="noopener">点击进入下载链接</a></p>
<p>点击download选择相应平台及版本，我这里选择的是社区版。</p>
<p><img class="lazyload" data-original="http://upload-images.jianshu.io/upload_images/3155702-752e691985633d3e.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240" src="/2017/06/18/2-1-first-program/gif;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVQImWNgYGBgAAAABQABh6FO1AAAAABJRU5ErkJggg==" alt="下载界面"></p>
<p>安装过程，切换主题，调整字体之类的都跟跟IDEA类似，不需赘述。</p>
<p>那么开始我们的hello world程序吧。<br>新建一个Python file ，然后写入</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs undefined">print (&quot;hello world&quot;)<br></code></pre></td></tr></table></figure>

<p>第一次运行在工作区右击 run即可</p>
<p><img class="lazyload" data-original="http://upload-images.jianshu.io/upload_images/3155702-4ffc1a9c05699113.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240" src="/2017/06/18/2-1-first-program/gif;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVQImWNgYGBgAAAABQABh6FO1AAAAABJRU5ErkJggg==" alt="右击运行"><br>之后就会在Toolbar上显示run按钮了</p>
<p><img class="lazyload" data-original="http://upload-images.jianshu.io/upload_images/3155702-a540d17991f6282d.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240" src="/2017/06/18/2-1-first-program/gif;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVQImWNgYGBgAAAABQABh6FO1AAAAABJRU5ErkJggg==" alt="运行截图"><br>很棒有木有</p>
<h3 id="2-1-先爬他一个网页下来"><a href="#2-1-先爬他一个网页下来" class="headerlink" title="2.1 先爬他一个网页下来"></a>2.1 先爬他一个网页下来</h3><p>敲入如下代码</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs undefined">import urllib2<br><br>response = urllib2.urlopen(&quot;http://www.baidu.com&quot;)<br>print response.read()<br></code></pre></td></tr></table></figure>

<p>点击运行，我们可以得到结果</p>
<p><img class="lazyload" data-original="http://upload-images.jianshu.io/upload_images/3155702-165b12f5711bcd47.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240" src="/2017/06/18/2-1-first-program/gif;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVQImWNgYGBgAAAABQABh6FO1AAAAABJRU5ErkJggg==" alt></p>
<p>这个网页的源码被我们爬下来了，是不是很简单！</p>
<h3 id="2-2-分析代码"><a href="#2-2-分析代码" class="headerlink" title="2.2 分析代码"></a>2.2 分析代码</h3><p>下面我们来分析下这段代码</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs undefined">import urllib2<br></code></pre></td></tr></table></figure>

<p>urllib2库是学习Python爬虫最基本的模块，利用这个模块我们可以得到网页的内容，并对内容用正则表达式提取分析，得到我们想要的结果</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs undefined">response = urllib2.urlopen(&quot;http://www.baidu.com&quot;)<br></code></pre></td></tr></table></figure>

<p>首先我们调用的是urllib2库里面的urlopen方法，传入一个URL，这个网址是百度首页，协议是HTTP协议，urlopen一般接受三个参数，它的参数如下：</p>
<p><img class="lazyload" data-original="http://upload-images.jianshu.io/upload_images/3155702-f03abc16265de8dc.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240" src="/2017/06/18/2-1-first-program/gif;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVQImWNgYGBgAAAABQABh6FO1AAAAABJRU5ErkJggg==" alt="urlopen参数"></p>
<p>第一个参数url即为URL，第二个参数data是访问URL时要传送的数据，第三个timeout是设置超时时间。</p>
<p>第二三个参数是可以不传送的，data默认为空None，timeout默认为 socket._GLOBAL_DEFAULT_TIMEOUT</p>
<p>第一个参数URL是必须要传送的，在这个例子里面我们传送了百度的URL，执行urlopen方法之后，返回一个response对象，返回信息便保存在这里面。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs undefined">print response.read()<br></code></pre></td></tr></table></figure>

<p>response对象有一个read方法，可以返回获取到的网页内容</p>
<h3 id="2-3-构造Request"><a href="#2-3-构造Request" class="headerlink" title="2.3 构造Request"></a>2.3 构造Request</h3><p>上面的urlopen参数可以传入一个request请求,它其实就是一个Request类的实例，构造时需要传入Url,Data等等的内容。我们可以将上面的代码改写为</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs undefined">import urllib2<br><br>request = urllib2.Request(&quot;http://www.baidu.com&quot;)<br>response = urllib2.urlopen(request)<br>print response.read()<br></code></pre></td></tr></table></figure>

<p>运行结果是完全一样的，只不过中间多了一个request对象，推荐大家这么写，因为在构建请求时还需要加入好多内容，通过构建一个request，服务器响应请求得到应答，这样显得逻辑上清晰明确。</p>
      </section>
      <section class="extra">
        
        <ul class="copyright">
  
  <li><strong>本文作者：</strong>杨云召（Flywith24）</li>
  <li><strong>本文链接：</strong><a href="https://flywith24.gitee.io/2017/06/18/2-1-first-program/index.html">https://flywith24.gitee.io/2017/06/18/2-1-first-program/index.html</a></li>
  <li><strong>版权声明：</strong>本博客所有文章均采用<a href="https://creativecommons.org/licenses/by-nc-sa/4.0/deed.zh"
      rel="external nofollow" target="_blank"> BY-NC-SA </a>许可协议，转载请注明出处！</li>
  
</ul>
        
        
        <section class="donate">
  <div class="qrcode">
    <img   class="lazyload" data-original="https://gitee.com/flywith24/Album/raw/master/img/20201015113301.jpg" src="" >
  </div>
  <div class="icon">
    <a href="javascript:;" id="alipay"><i class="iconfont iconalipay"></i></a>
    <a href="javascript:;" id="wechat"><i class="iconfont iconwechat-fill"></i></a>
  </div>
</section>
        
        
  <ul class="tag-list"><li class="tag-list-item"><a class="tag-list-link" href="/tags/python/">python</a></li><li class="tag-list-item"><a class="tag-list-link" href="/tags/爬虫/">爬虫</a></li></ul>

        
<nav class="nav">
  
    <a href="/2017/06/18/2-2-grade/"><i class="iconfont iconleft"></i>2.2从教务系统查询成绩并计算绩点——山东建筑大学为例</a>
  
  
    <a href="/2017/06/05/1-1-use-function/">1.1使用函数<i class="iconfont iconright"></i></a>
  
</nav>

      </section>
      
      <section class="comments">
  
  <div class="btn" id="comments-btn">查看评论</div>
  
  
<div id="valine"></div>
<script defer src="//unpkg.com/valine/dist/Valine.min.js"></script>
<script>
  window.onload = function () {
    var loadValine = function () {
      new Valine({
        el: '#valine',
        app_id: "SHA3gzCcxFlqaPcGRnX4UgET-gzGzoHsz",
        app_key: "OYo7ITS4AzTvBUvb1yIY8oaf",
        placeholder: "期待你的留言",
        avatar: "mp",
        pageSize: "10",
        lang: "zh-CN",
      });
    }
    if ( true ) {
      $("#comments-btn").on("click", function () {
        $(this).hide();
        loadValine();
      });
    } else {
      loadValine();
    }
  };
</script>

</section>
      
    </section>
  </div>
</article>
  </div>
</main>
  <footer class="footer">
  <div class="footer-social">
    
    
    
    
    
    <a href="tencent://message/?Menu=yes&uin=1032367864 " target="_blank" onMouseOver="this.style.color= '#12B7F5'"
      onMouseOut="this.style.color='#33333D'">
      <i class="iconfont footer-social-item  iconQQ "></i>
    </a>
    
    
    
    
    
    <a href="javascript:; " target="_blank" onMouseOver="this.style.color= '#8bc34a'"
      onMouseOut="this.style.color='#33333D'">
      <i class="iconfont footer-social-item  iconwechat-fill "></i>
    </a>
    
    
    
    
    
    <a href="https://flywith24.gitee.io/about/ " target="_blank" onMouseOver="this.style.color= '#d32f2f'"
      onMouseOut="this.style.color='#33333D'">
      <i class="iconfont footer-social-item  iconheart "></i>
    </a>
    
    
    
    
    
    <a href="https://github.com/Flywith24 " target="_blank" onMouseOver="this.style.color= '#24292E'"
      onMouseOut="this.style.color='#33333D'">
      <i class="iconfont footer-social-item  icongithub-fill "></i>
    </a>
    
    
    
    
    
    <a href="mailto:youngyunzhao@163.com " target="_blank" onMouseOver="this.style.color='#FFBE5B'"
      onMouseOut="this.style.color='#33333D'">
      <i class="iconfont footer-social-item  iconmail"></i>
    </a>
    
  </div>
  <div class="footer-copyright"><p>Copyright© 2017 - 2020 ❤️ Flywith24</p></div>
</footer>
  
      <div class="fab fab-plus">
    <i class="iconfont iconplus"></i>
  </div>
  
  <div class="fab fab-daovoice">
    <i class="iconfont iconcomment"></i>
  </div>
  
  <div class="fab fab-up">
    <i class="iconfont iconcaret-up"></i>
  </div>
  
<script src="/live2dw/lib/L2Dwidget.min.js?094cbace49a39548bed64abff5988b05"></script><script>L2Dwidget.init({"pluginRootPath":"live2dw/","pluginJsPath":"lib/","pluginModelPath":"assets/","tagMode":false,"debug":false,"model":{"jsonPath":"/live2dw/assets/shizuku.model.json"},"display":{"position":"left","width":150,"height":300},"mobile":{"show":"tfalse"},"log":false});</script></body>

<script src="https://cdn.bootcss.com/jquery/3.4.1/jquery.min.js"></script>




<script src="https://cdn.bootcdn.net/ajax/libs/jquery.lazyload/1.9.1/jquery.lazyload.min.js"></script>




<script src="https://cdn.bootcss.com/fancybox/3.5.7/jquery.fancybox.min.js"></script>




<script src="/js/utils.js"></script>
<script src="/js/modules.js"></script>
<script src="/js/zui.js"></script>
<script src="/js/script.js"></script>




<script>
  (function (i, s, o, g, r, a, m) {
    i["DaoVoiceObject"] = r;
    i[r] = i[r] || function () {
      (i[r].q = i[r].q || []).push(arguments)
    }, i[r].l = 1 * new Date();
    a = s.createElement(o), m = s.getElementsByTagName(o)[0];
    a.async = 1;
    a.src = g;
    a.charset = "utf-8";
    m.parentNode.insertBefore(a, m)
  })(window, document, "script", ('https:' == document.location.protocol ? 'https:' : 'http:') +
    "//widget.daovoice.io/widget/0f81ff2f.js", "daovoice")
  daovoice('init', {
    app_id: "7785620b"
  }, {
    launcher: {
      disableLauncherIcon: true,
    },
  });
  daovoice('update');
</script>



<script>
  (function () {
    var bp = document.createElement('script');
    var curProtocol = window.location.protocol.split(':')[0];
    if (curProtocol === 'https') {
      bp.src = 'https://zz.bdstatic.com/linksubmit/push.js';
    } else {
      bp.src = 'http://push.zhanzhang.baidu.com/push.js';
    }
    var s = document.getElementsByTagName("script")[0];
    s.parentNode.insertBefore(bp, s);
  })();
</script>


<script>
  var _hmt = _hmt || [];
  (function () {
    var hm = document.createElement("script");
    hm.src = "https://hm.baidu.com/hm.js?4c204d8bc027a0455b5fc642ac334ca8";
    var s = document.getElementsByTagName("script")[0];
    s.parentNode.insertBefore(hm, s);
  })();
</script>










</html>